Skip to content

Improvements to show, and new show from disc#4911

Open
mariaKt wants to merge 6 commits intodevelopfrom
mk/show-from-disk-2
Open

Improvements to show, and new show from disc#4911
mariaKt wants to merge 6 commits intodevelopfrom
mk/show-from-disk-2

Conversation

@mariaKt
Copy link
Copy Markdown
Collaborator

@mariaKt mariaKt commented Apr 20, 2026

Memory-efficient proof display: streaming show and lazy loading

Motivation

Displaying large proofs with kmir show caused out-of-memory failures. A completed 1411-node proof required 23.8 GB of memory just to print its output, after the proof itself had already finished successfully.

Investigation

The initial hypothesis was that the memory cost came from joining the entire output into a single string before printing ('\n'.join(lines)). Streaming the output line-by-line (via a generator) eliminated this duplication, but peak memory remained at 23.8 GB. The real cost was loading the entire proof into memory, parsing all node CTerms and constraint data from JSON into Python K-term objects, with ~6× memory overhead from Python object representation.

Changes

This PR introduces two improvements:

1. Generator-based show_iter

KCFGShow.show_iter and APRProofShow.show_iter yield lines one at a time instead of collecting them into a list. The existing show methods are reimplemented as list(self.show_iter(...)) to avoid code duplication. Tests confirm same output.

2. Lazy-loading show_iter_from_disk

APRProofShow.show_iter_from_disk displays a proof without loading the full proof into memory:

  • LazyNode — duck-types KCFG.Node. Stores node ID, attrs, and a file path. Loads the CTerm from nodes/{id}.json only when .cterm is accessed for printing.
  • LazyCSubst — duck-types CSubst. Holds the raw JSON dict and defers parsing into K-term objects until .constraints or .subst is accessed.
  • APRProofStub — duck-types APRProof. Holds proof metadata (init, target, terminal, bounded, refutations) from proof.json and answers proof-level queries (is_init, is_target, is_terminal, etc.) without loading the KCFG.

These stubs are passed into the real KCFG data structures (edges, splits, covers, ndbranches accept them via duck typing) and the existing pretty_segments traversal works unchanged, with no reimplementation or duplication of the tree-drawing logic.

Loading is integrated into the existing code path: KCFG.from_dict accepts a lazy parameter that controls whether nodes are created as LazyNode stubs and covers use LazyCSubst wrappers. KCFGStore.read_cfg_data_lazy reads kcfg.json without loading node JSON files. KCFG.read_cfg_data_lazy provides the entry point, combining the two.

Results

Tested on a 1411-node p-token proof (burn_multisig_n1):

show_iter (full load) show_iter_from_disk (lazy)
Peak memory 23.8 GB 6.3 GB
Time 4263s (~71 min) 315s (~5 min)
Output lines 10,221 10,221

74% memory reduction, 13.5× faster, identical output.

Testing

  • show_iter_from_disk output is compared against show output in three existing test_imp.py test functions (all tests that write proofs to disk). Both default and full-printer modes are exercised.
  • Manually verified output match on external proofs with APRProofNodePrinter (proof-level attrs: init, target, terminal).
  • All 98 existing pyk show-related tests pass.

@mariaKt mariaKt marked this pull request as ready for review April 21, 2026 17:37
Comment thread pyk/src/pyk/kcfg/lazy.py
Comment on lines +44 to +47
def evict(self) -> None:
"""Release the loaded CTerm from memory."""
self._cterm = None

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't actually see where this is used, how does this work? I assume somehow this gets called whenever we're done printing a given node?

Comment thread pyk/src/pyk/kcfg/lazy.py
Comment on lines +48 to +52
def __eq__(self, other: object) -> bool:
if isinstance(other, LazyNode):
return self.id == other.id
# Also compare with real KCFG.Node
return hasattr(other, 'id') and self.id == other.id
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Would this be equivalent anyway?

Suggested change
def __eq__(self, other: object) -> bool:
if isinstance(other, LazyNode):
return self.id == other.id
# Also compare with real KCFG.Node
return hasattr(other, 'id') and self.id == other.id
def __eq__(self, other: object) -> bool:
return hasattr(other, 'id') and self.id == other.id

Or perhaps the tighter:

Suggested change
def __eq__(self, other: object) -> bool:
if isinstance(other, LazyNode):
return self.id == other.id
# Also compare with real KCFG.Node
return hasattr(other, 'id') and self.id == other.id
def __eq__(self, other: object) -> bool:
if isinstance(other, LazyNode):
return self.id == other.id
# Also compare with real KCFG.Node
return isinstance(other, KCFG.Node) and self.id == other.id

@ehildenb
Copy link
Copy Markdown
Member

ehildenb commented Apr 22, 2026

Does it make sense to just have lazy-loading be the default anyway, and build it into the main KCFG class?

@ehildenb
Copy link
Copy Markdown
Member

@claude can you review this PR? And check if evict is actually used, and whether it makes sense to just have lazy loading be the default for KCFGs?

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants